The goal of this exploration is to develop a hierarchical Bayesian model to estimate the true case fatality rate (CFR) for each county. In particular, we will start by taking advantage of the grouping of counties within states. The result of the model will be a “denoised” estimate of the CFR for each county in the country.

The initial motivation for this exploration is to use the distribution of the denoised CFR across counties to select the two shape parameters of a beta prior distribution. That prior will enable the analytic calculation of a denoised posterior CFR for an arbitrary county, taking advantage of the conjugacy between a beta prior and a binomial likelihood.

Exploratory plots

Numerical summary of case fatality rates

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.008929 0.018868 0.024723 0.033333 0.312500

Case fatality rates by number of cases

The most extreme CFRs come from counties with small numbers of cases
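To make the sampling-noise argument concrete, here is a minimal sketch of the binomial standard error of an observed CFR at two county sizes. The case counts (10 and 10,000) are illustrative values, not taken from the data; the rate is the mean county CFR from the numerical summary above.

```python
import math


def binomial_se(p, n):
    """Standard error of an observed proportion from n cases with true rate p."""
    return math.sqrt(p * (1.0 - p) / n)


mean_cfr = 0.024723  # mean county CFR from the numerical summary
se_small = binomial_se(mean_cfr, 10)       # hypothetical county with 10 cases
se_large = binomial_se(mean_cfr, 10_000)   # hypothetical county with 10,000 cases

print(f"SE with 10 cases:     {se_small:.4f}")
print(f"SE with 10,000 cases: {se_large:.4f}")
```

With only a handful of cases the standard error is larger than the rate itself, so very high or zero observed CFRs are expected from noise alone.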

Median and IQR of county CFR by state

There appears to be meaningful clustering of CFR within states, which suggests that a model with a random effect for state is appropriate.

Modelling

We fit a binomial model to estimate an adjusted (denoised) CFR for each county by shrinking the county CFR towards the state CFR and shrinking the state CFR towards the national CFR. The prior \(N(0,1.6)\) on the intercept (standard deviation 1.6 on the logit scale) is chosen because it is approximately uniform over [0,1] when transformed to the probability scale.
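The claim about the intercept prior can be checked numerically. The sketch below compares the CDF implied on the probability scale by a normal prior with standard deviation 1.6 on the logit scale against the CDF of a uniform distribution. The helper names are my own and are not part of the analysis code.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit(p):
    return math.log(p / (1.0 - p))


def implied_cdf(p, sd=1.6):
    """P(inv_logit(X) <= p) when X ~ Normal(0, sd) on the logit scale."""
    return normal_cdf(logit(p) / sd)


# A Uniform(0, 1) variable has CDF F(p) = p, so the gap below measures
# how far the transformed prior is from uniform on a grid of probabilities.
grid = [k / 100 for k in range(1, 100)]
max_gap = max(abs(implied_cdf(p) - p) for p in grid)
print(f"largest CDF gap from uniform: {max_gap:.3f}")
```

The gap stays at the level of a couple of percentage points across the grid, which is what "approximately uniform" means here.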

I should note that with these priors, this simple model probably does not require Stan to fit (we could use, e.g., glmer). However, the Stan machinery will be needed if we make the model more complex.

We could in theory obtain more precise estimates by placing a more informative prior on the national CFR; however, the gains in precision would likely be small given that there is ample data to estimate the national CFR.

Model with intercept and state and county random effects

Priors

##            prior     class      coef group resp dpar nlpar bound
## 1 normal(0, 1.6) Intercept                                      
## 2   normal(0, 1)        sd                                      
## 3                       sd            fips                      
## 4                       sd Intercept  fips                      
## 5                       sd           state                      
## 6                       sd Intercept state

Fit summary

The standard deviation estimates for the state random effect and the county random effect are roughly the same and are relatively large, suggesting that there is meaningful variation in CFR both within and between states.

##  Family: binomial 
##   Links: mu = logit 
## Formula: deaths | trials(cases) ~ (1 | state) + (1 | fips) 
##    Data: dat (Number of observations: 3129) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~fips (Number of levels: 3129) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.58      0.01     0.56     0.60 1.00      857     1766
## 
## ~state (Number of levels: 51) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.54      0.06     0.44     0.69 1.00      815     1315
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -3.84      0.08    -3.98    -3.68 1.01      361      572
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
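Because the estimates above are on the logit scale, it can help to translate them to the probability scale. The sketch below plugs the posterior means from the fit summary into the inverse logit. This is only a rough reading: it uses point estimates rather than the full posterior, and the intercept describes a county whose state and county random effects are both zero, which is not the same thing as the case-weighted national CFR.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


intercept = -3.84   # posterior mean of the intercept (logit scale)
sd_county = 0.58    # posterior mean of sd(Intercept) for fips

typical = inv_logit(intercept)
low = inv_logit(intercept - sd_county)    # county one SD below its state
high = inv_logit(intercept + sd_county)   # county one SD above its state

print(f"typical county CFR: {typical:.4f}")
print(f"one county SD below / above: {low:.4f} / {high:.4f}")
```

A one standard deviation shift in the county effect moves the CFR from roughly 1% to roughly 4%, which is why the county-level variation counts as relatively large.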

National CFR

The adjusted national CFR is essentially the same as the unadjusted national CFR, because there is ample data to estimate this CFR.

cases deaths CFR CFR_adj
6712730 207124 0.03086 0.03086

State CFR

The adjusted state CFRs are very close to the unadjusted state CFRs, because there is ample data to estimate these as well. The regularization on the state random effect may become more important if we do something more complex, such as incorporating a time trend that interacts with state.

County CFR

The adjustment on the county-level CFR is much more meaningful. The many points substantially below the diagonal on this plot indicate counties where the unadjusted CFR is very large but the adjusted (denoised) CFR is much more moderate.

Case fatality rates (adjusted and unadjusted) by number of cases

The relationship between CFR and number of cases looks different after adjustment, in that CFRs for counties with few cases have been shrunken to more moderate values. Interestingly, there is a slight positive relationship (approximately linear on the log scale) between CFR and cases, for both adjusted and unadjusted CFR.

Distribution of CFR by state

Hover over densities to see which state they represent.

Fit beta distribution to overall state-level CFR for use as prior in beta-binomial conjugate adjustment of county CFR at future times

First fit beta distribution to distribution of county-level CFR nationally.
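For readers who want to see what fitting a beta distribution to a set of rates involves, here is a minimal method-of-moments sketch. This is a swapped-in, simpler technique and an assumption on my part: the parameter estimates reported below come from the analysis itself, whose fitting method is not shown here and may well be maximum likelihood. The function name and the example rates are hypothetical.

```python
from statistics import mean, variance


def fit_beta_moments(rates):
    """Method-of-moments estimates (shape1, shape2) for a beta distribution."""
    m = mean(rates)
    v = variance(rates)  # sample variance
    if not 0.0 < m < 1.0 or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("moments are not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Hypothetical adjusted county CFRs, for illustration only.
example_rates = [0.010, 0.015, 0.020, 0.025, 0.030, 0.050]
shape1, shape2 = fit_beta_moments(example_rates)
print(f"shape1 = {shape1:.3f}, shape2 = {shape2:.3f}")
```

By construction the fitted mean, shape1 / (shape1 + shape2), equals the sample mean of the rates.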

Parameter estimates

shape1 shape2
2.842 111.3
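With these estimates as the prior, the conjugate adjustment is a one-line update: a Beta(shape1, shape2) prior combined with a binomial likelihood gives a Beta(shape1 + deaths, shape2 + cases - deaths) posterior. The sketch below applies this to a hypothetical county with 2 deaths among 8 cases; the counts and function names are illustrative, not from the data.

```python
def beta_binomial_posterior(shape1, shape2, deaths, cases):
    """Posterior beta parameters after observing deaths out of cases."""
    if deaths < 0 or cases < deaths:
        raise ValueError("need 0 <= deaths <= cases")
    return shape1 + deaths, shape2 + (cases - deaths)


def beta_mean(shape1, shape2):
    return shape1 / (shape1 + shape2)


prior = (2.842, 111.3)   # national fit reported above
deaths, cases = 2, 8     # hypothetical small county

posterior = beta_binomial_posterior(*prior, deaths, cases)
raw_cfr = deaths / cases
adjusted_cfr = beta_mean(*posterior)

print(f"prior mean:   {beta_mean(*prior):.4f}")
print(f"raw CFR:      {raw_cfr:.4f}")
print(f"adjusted CFR: {adjusted_cfr:.4f}")
```

The prior acts like roughly 114 pseudo-cases, so a county with 8 cases is pulled almost all the way to the prior mean, while a county with thousands of cases would barely move.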

Empirical density versus fitted distribution

Fit a separate distribution per state

state shape1 shape2 n_counties deaths cases mean_CFR mean_CFR_fitted
DE 76.26 2090 3 645 19086 0.0353 0.0352
HI 17.14 934.5 4 152 11375 0.01478 0.01802
RI 3.609 67.5 5 1092 21374 0.05063 0.05075
CT 4.854 60.55 8 4513 55406 0.07417 0.07421
NH 6.622 190.4 10 442 7941 0.02801 0.03361
MA 6.118 73.64 14 9502 133599 0.07335 0.07672
VT 12.35 485.6 14 58 1707 0.01698 0.02481
AZ 11.57 321.6 15 5705 214015 0.03479 0.03473
ME 3.187 86.42 16 142 5075 0.03423 0.03557
NV 19.16 1006 16 1620 75774 0.01301 0.01869
NJ 14.07 153.9 21 16135 199432 0.08393 0.08379
WY 7.769 607.5 23 53 4869 0.01402 0.01263
MD 6.709 195.8 24 3945 120156 0.03274 0.03313
AK 39.01 4271 26 58 6834 0.008358 0.00905
UT 4.641 519.9 29 475 63732 0.008107 0.008848
NM 5.46 194.2 33 890 26215 0.02757 0.02734
OR 11.84 704 35 563 30795 0.01314 0.01654
WA 5.079 254 39 2139 82248 0.01879 0.01961
ID 4.404 301.8 44 480 37488 0.01551 0.01438
SC 8.218 263.2 46 3442 137708 0.0305 0.03028
ND 10.15 634.2 53 271 17954 0.01765 0.01576
MT 9.251 527.9 54 186 10299 0.01492 0.01722
WV 4.799 181.5 55 357 14049 0.02512 0.02576
CA 4.763 269.9 59 16142 785501 0.01604 0.01734
NY 3.034 61.18 62 32773 449900 0.0466 0.04726
CO 6.718 260.1 63 2057 64827 0.02066 0.02517
LA 8.091 231.5 64 5387 160971 0.0335 0.03377
SD 9.98 657.5 66 248 18695 0.0143 0.01495
AL 3.672 172.2 67 2558 144960 0.02134 0.02088
FL 5.477 251.1 67 14628 682155 0.02108 0.02135
PA 4.752 111.4 67 8199 150327 0.03814 0.04092
WI 6.651 616.9 72 1372 107291 0.01089 0.01067
AR 4.291 183.5 75 1407 74032 0.02246 0.02286
OK 6.018 367.8 77 1051 76725 0.01623 0.0161
MS 8.065 218.8 82 3015 93361 0.03566 0.03555
MI 4.574 125.1 83 7043 122119 0.03468 0.03526
MN 3.641 220.9 87 2073 89806 0.01667 0.01622
OH 3.815 99.83 88 4925 144308 0.0367 0.03681
NE 4.214 255.6 91 497 40882 0.01713 0.01622
IN 3.641 108.5 92 3669 113914 0.03172 0.03247
TN 6.45 408.8 95 2525 177686 0.01552 0.01553
IA 3.998 214.6 99 1377 79995 0.01817 0.01829
NC 5.203 231.1 100 3629 193546 0.02215 0.02202
IL 4.9 207.6 102 8774 274198 0.02184 0.02306
KS 6.049 402 105 686 52684 0.01653 0.01483
MO 4.421 297.4 115 2170 112764 0.01404 0.01465
KY 4.551 217.3 120 1205 61515 0.01921 0.02051
VA 3.579 127.9 133 3270 140488 0.02714 0.02722
GA 4.192 129.4 159 6966 286660 0.03196 0.03138
TX 5.7 179 251 15984 701334 0.03382 0.03086

Empirical density versus fitted distribution by state

The number in parentheses after each state indicates the number of counties.

Compare adjusted CFR from model to adjusted CFR computed using empirical Bayes with the state-specific beta prior

Plot adjusted CFR from model versus from EB

Adjusted CFRs from EB are generally lower than those from the model, where the two estimates differ. To understand how similar these two adjusted estimates are relative to the unadjusted CFR, we have to look at the unadjusted CFR as well.

Plot all three CFRs for each county (ordered by model-adjusted CFR)

In most cases, the two adjusted CFRs are similar (relative to the unadjusted CFR). The Empirical Bayes adjustment tends to shrink the CFR to slightly lower values than the model adjustment.

One possible explanation for differences between the two adjustment methods is that in the model, there is a single variance for the county random effects across states, while in the EB method, the fitted beta distributions have differing variances by state.

Now plot the adjusted CFR from EB versus the adjusted CFR from the model, by state

Compare underreporting factors estimated using adjusted and unadjusted CFR

These estimates assume a true mortality rate of 0.0138. The distribution of estimated underreporting factors is much more heavy-tailed for the unadjusted CFRs than for the adjusted CFRs.
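The calculation behind the underreporting factor is not spelled out in this report. The sketch below assumes one common definition, the ratio of the observed CFR to the assumed true mortality rate, so treat the formula as my assumption rather than the method used here. The two example CFRs are hypothetical values for a small county before and after adjustment.

```python
TRUE_MORTALITY_RATE = 0.0138  # assumed true mortality rate stated above


def underreporting_factor(cfr, true_rate=TRUE_MORTALITY_RATE):
    """Assumed definition: observed CFR divided by the true mortality rate."""
    if true_rate <= 0.0:
        raise ValueError("true_rate must be positive")
    return cfr / true_rate


unadjusted_cfr = 0.25   # hypothetical: 2 deaths among 8 cases
adjusted_cfr = 0.04     # hypothetical denoised value for the same county

print(f"factor from unadjusted CFR: {underreporting_factor(unadjusted_cfr):.1f}")
print(f"factor from adjusted CFR:   {underreporting_factor(adjusted_cfr):.1f}")
```

Under this definition, noisy CFRs from small counties translate directly into extreme factors, which is consistent with the heavier tail seen for the unadjusted estimates.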

Write out state priors for use in 19 and Me